Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(x/bank): Replace regex parsing of denom validation to generated parsing #19511

Merged
merged 20 commits into from
Mar 7, 2024

Conversation

mattverse
Copy link
Contributor

@mattverse mattverse commented Feb 21, 2024

Description

Closes: #17221

Replaces Regex Usage when parsing denom for validation. Instead uses generated code for the validation using ragel( https://en.wikipedia.org/wiki/Ragel).

As linked in original issue, using regex for denom validation has a huge performance overhead, thus replacing it with direct matching methods.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features
    • Introduced a ValueCodec for enhanced type handling in collections.
    • Added a new comparison method IsGT for types.Coin instances.
    • Implemented core/router.Service to improve runtime service handling.
  • Refactor
    • Updated coin denomination validation to use direct matching methods, enhancing validation accuracy.
    • Enhanced ParseDecCoin function for more efficient parsing of decimal coins.
  • Tests
    • Modified integration tests to use a custom regular expression for coin denominations, ensuring broader test coverage.

@mattverse mattverse requested a review from a team as a code owner February 21, 2024 09:14
Copy link
Contributor

coderabbitai bot commented Feb 21, 2024

Walkthrough

Walkthrough

The recent update focuses on enhancing the efficiency and reliability of coin denomination validation and parsing within the Cosmos SDK. By replacing regex parsing with direct matching methods and introducing a new codec and comparison method, the changes aim to optimize runtime performance and improve the codebase's maintainability.

Changes

Files Change Summary
types/coin.go
types/coin_regex.go
Replaced regex parsing with direct matching methods for denomination validation.
types/coin_regex.go
types/.../coin_regex.rl
Added functionality for parsing and matching coin denominations using regular expressions and Ragel.
types/dec_coin.go Enhanced ParseDecCoin function for better parsing accuracy.
.../bank/keeper/deterministic_test.go Updated denomination regex for testing purposes.
core/router.Service Implemented in runtime for service routing.
collections/maps.go Added ValueCodec for math.Uint type.

Assessment against linked issues

Objective Addressed Explanation
Enhance coin denom matching speed and shift work to compile time rather than runtime. (#17221)
Replace regexp parsing with more direct methods that are functionally equivalent. (#17221)
Validate that denom strings adhere to a specified regex pattern when writing metadata. (#12026) The changes focus on parsing and validation mechanisms, but do not address validation in InitGenesis specifically.

Related issues

  • Issue Only allow valid denoms in bank's denom Metadata #12026 could be linked to this PR because:
    • The PR introduces improvements in denomination validation which is a step towards ensuring only valid denominations are allowed, addressing part of the issue's objectives regarding validation of denom strings.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between 0cf0c28 and 2c040a4.
Files selected for processing (6)
  • types/coin.go (2 hunks)
  • types/coin_regex.go (1 hunks)
  • types/coin_regex.rl (1 hunks)
  • types/coin_test.go (1 hunks)
  • types/dec_coin.go (1 hunks)
  • types/dec_coin_test.go (1 hunks)
Files not summarized due to errors (1)
  • types/coin_regex.go: Error: Message exceeds token limit
Additional comments: 54
types/coin_regex.rl (2)
  • 3-21: The MatchDenom function is well-implemented using Ragel, providing a more efficient alternative to regex-based parsing for coin denominations. The use of character classes and patterns is clear and concise. However, it's important to ensure comprehensive test coverage for various valid and invalid denomination strings to validate the correctness of this parsing logic.
  • 24-66: The MatchDecCoin function introduces a more complex parsing logic to handle decimal coin amounts and denominations. The use of actions to mark the start and end of amounts and denominations is a good practice, allowing for precise extraction. However, similar to MatchDenom, it's crucial to verify the correctness through extensive testing, especially for edge cases and invalid inputs.
types/dec_coin.go (1)
  • 634-642: The update to ParseDecCoin to use MatchDecCoin for parsing is a significant improvement in terms of performance. By avoiding regex and using specific byte positions for amount and denomination extraction, the function becomes more efficient. Ensure that this new parsing logic is thoroughly tested, particularly for handling edge cases and ensuring that the extracted positions accurately reflect the intended parts of the input string.
types/coin.go (1)
  • 846-852: The replacement of regex-based validation with a Ragel-generated function (MatchDenom) in the ValidateDenom function is a significant change aimed at improving performance by reducing runtime overhead. This approach aligns well with the PR's objectives of enhancing efficiency and maintainability by moving away from regex for denom validation.
  • Performance: The shift to a compile-time generated parsing method should indeed offer substantial performance improvements over the regex-based approach, especially in scenarios where denom validation is a frequent operation.
  • Maintainability: While the performance benefits are clear, it's important to consider the maintainability of the Ragel-generated code. Generated code can sometimes be harder to read or modify than hand-written code, depending on how it's structured and documented. It would be beneficial to ensure that there's sufficient documentation on how to regenerate this code if the validation rules need to change in the future.

Overall, this change appears to be a positive step towards optimizing the Cosmos SDK's performance and codebase clarity. However, ensuring the maintainability of the Ragel-generated code is crucial for future modifications and understanding of the validation logic.

types/coin_regex.go (2)
  • 5-448: The MatchDenom function is well-structured and follows the expected pattern for a Ragel-generated state machine. It's designed to validate coin denominations efficiently. However, it's crucial to ensure that the generated code aligns with the intended patterns for denominations. Since this is machine-generated, it's assumed to be correct based on the Ragel specification provided. The function returns a boolean indicating whether the input matches the denomination pattern, which is a straightforward and expected behavior.
  • 451-963: The MatchDecCoin function is designed to parse decimal coin amounts and extract the amount and denomination parts. It's more complex than MatchDenom due to the need to capture specific parts of the input. The function returns four values: amountStart, amountEnd, denomEnd, and isValid, which are used to extract the amount and denomination from the input.
  1. The use of -1 as the initial value for amountStart, amountEnd, and denomEnd is a good practice for indicating that these positions have not been found.
  2. The isValid flag is correctly set based on the state machine's final state, ensuring that only valid decimal coin strings are accepted.
  3. The adjustments made in the EOF actions (e.g., adjusting denomEnd to exclude space if present) are thoughtful and necessary for correct parsing.

Overall, the function appears to be correctly implemented according to the Ragel specification. It's important to have comprehensive tests to ensure that all edge cases are handled correctly, especially given the complexity of parsing decimal coin amounts.

types/dec_coin_test.go (1)
  • 380-382: The added test case for parsing decimal coins with a value of 5.5 atoms is correctly implemented and follows the established testing pattern in the file. It's good to see that the test case explicitly checks for the expected outcome, ensuring that the parsing logic handles decimal values correctly. This test is crucial for validating the changes made to the parsing logic in the Cosmos SDK.
types/coin_test.go (47)
  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]

The package declaration and imports are correctly structured, adhering to Go conventions. It's good practice to group standard library imports separately from third-party libraries, which is followed here.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [11-11]

The global variables testDenom1 and testDenom2 are well-named, indicating their purpose as denominations for test coins. This enhances readability and maintainability of the test suite.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [13-13]

The coinTestSuite struct is well-organized, containing fields for various test coins and a field for empty coins. This setup facilitates the reuse of test data across different test cases, promoting the DRY (Don't Repeat Yourself) principle.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [15-15]

The TestCoinTestSuite function correctly initializes the test suite using the suite.Run function from the testify/suite package. This approach leverages the setup and teardown hooks provided by the testify suite, allowing for more organized and manageable test code.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [17-17]

The SetupSuite method is efficiently setting up test coins for use in the suite's test cases. The use of math.NewInt and sdk.NewCoin ensures that the test data is accurately represented and easily modifiable.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [21-21]

The TestCoin method correctly tests the creation of coins with various denominations and amounts, including negative amounts to ensure that the coin creation logic correctly panics in such cases. This is crucial for maintaining the integrity of coin operations within the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [23-23]

The TestCoin_String method effectively tests the string representation of a coin, ensuring that it matches the expected format. This is important for user interfaces and logging, where a clear and accurate representation of coins is necessary.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [25-25]

The TestIsEqualCoin method thoroughly tests the equality comparison between coins, covering various scenarios including different denominations and amounts. This ensures the reliability of coin comparisons, which are fundamental to many operations in the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [27-27]

The TestCoinIsValid method tests the validity of coins with various denominations, including edge cases like very long denominations and special characters. This comprehensive testing is essential for ensuring that only valid coins are accepted by the system.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [29-29]

The TestCoinsDenoms method correctly tests the extraction of denominations from a set of coins, ensuring that the functionality works as expected even in edge cases. This is important for operations that need to work with the denominations of coins directly.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [31-31]

The TestAddCoin method provides a thorough test of the addition operation for coins, including checks for overflows. This is crucial for ensuring the accuracy and safety of coin arithmetic operations within the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [33-33]

The TestAddCoinAmount method tests the addition of a raw amount to a coin, covering both positive and zero addition cases. This ensures that coin amounts can be accurately and safely modified.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [35-35]

The TestSubCoin method tests the subtraction operation between coins, including cases that should panic due to invalid operations. This ensures the robustness and safety of coin subtraction within the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [37-37]

The TestSubCoinAmount method tests the subtraction of a raw amount from a coin, including cases that should panic due to resulting in negative amounts. This is important for maintaining the integrity of coin values.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [39-39]

The TestMulIntCoins method tests the multiplication of a set of coins by an integer, including a case that should panic due to overflow. This ensures that coin values can be accurately scaled up while maintaining system integrity.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [41-41]

The TestQuoIntCoins method tests the division of a set of coins by an integer, including cases that should panic due to division by zero. This ensures that coin values can be accurately scaled down in a safe manner.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [43-43]

The TestIsGTCoin method tests the greater-than comparison between coins, including cases that should panic due to invalid comparisons. This ensures the reliability of coin comparisons, which are essential for decision-making logic in the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [45-45]

The TestIsGTECoin method tests the greater-than-or-equal-to comparison between coins, including cases that should panic due to invalid comparisons. This ensures the accuracy of coin comparisons, supporting various operations and logic within the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [47-47]

The TestIsLTECoin method tests the less-than-or-equal-to comparison between coins, including cases that should panic due to invalid comparisons. This is important for ensuring the correctness of comparisons, which are fundamental to many SDK operations.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [49-49]

The TestIsLTCoin method tests the less-than comparison between coins, including cases that should panic due to invalid comparisons. This ensures the robustness of coin comparison logic, which is crucial for various SDK functionalities.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [51-51]

The TestCoinIsZero method correctly tests whether a coin's amount is zero, covering both true and false cases. This functionality is essential for operations that need to distinguish between zero and non-zero coin amounts.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [53-53]

The TestCoinIsNil method tests the nil-check functionality for coins, ensuring that both uninitialized coins and coins with a denomination but no amount are considered nil. This is important for handling uninitialized or partially initialized coins correctly.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [55-55]

The TestFilteredZeroCoins method tests the filtering of zero-amount coins from a set, ensuring that only coins with non-zero amounts are retained. This is crucial for operations that should ignore zero-amount coins.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [57-57]

The TestCoins_String method tests the string representation of a set of coins, ensuring that it matches the expected format. This is important for displaying coin sets in user interfaces and logs.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [59-59]

The TestIsZeroCoins method tests whether a set of coins contains only zero-amount coins, covering various scenarios. This functionality is essential for operations that need to treat zero-amount coin sets differently from non-zero ones.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [61-61]

The TestEqualCoins method thoroughly tests the equality comparison between sets of coins, ensuring that the comparison logic is accurate and reliable. This is fundamental to many operations within the SDK that rely on comparing coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [63-63]

The TestAddCoins method tests the addition of two sets of coins, including cases that should result in zero coins being removed from the result. This ensures the accuracy and safety of adding coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [65-65]

The TestCoinsAddCoalescesDuplicateDenominations method tests the addition of coin sets with duplicate denominations, ensuring that duplicates are correctly coalesced. This is crucial for maintaining the integrity of coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [67-67]

The TestSubCoins method tests the subtraction of one set of coins from another, including cases that should panic due to invalid operations. This ensures the robustness and safety of coin set subtraction within the SDK.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [69-69]

The TestSafeSubCoin method tests the safe subtraction of one coin from another, including cases that should result in an error due to negative results. This is important for ensuring that coin subtraction is performed safely and accurately.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [71-71]

The TestCoins_Validate method tests the validation logic for sets of coins, including various invalid scenarios. This comprehensive testing is essential for ensuring that only valid coin sets are accepted by the system.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [73-73]

The TestMinMax method tests the calculation of the minimum and maximum between two sets of coins, ensuring that the functionality works as expected. This is important for operations that need to compare coin sets to determine bounds.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [75-75]

The TestCoinsGT method tests the greater-than comparison between sets of coins, ensuring that the comparison logic is accurate. This is crucial for decision-making logic within the SDK that relies on comparing coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [77-77]

The TestCoinsLT method tests the less-than comparison between sets of coins, ensuring that the comparison logic is accurate. This is important for various SDK operations that depend on comparing coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [79-79]

The TestCoinsLTE method tests the less-than-or-equal-to comparison between sets of coins, ensuring that the comparison logic is accurate. This supports the correctness of operations that rely on comparing coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [81-81]

The TestParseCoins method tests the parsing of coin strings into sets of coins, covering various scenarios including invalid inputs. This ensures that coin strings can be accurately and safely converted into coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [83-83]

The TestSortCoins method tests the sorting functionality for sets of coins, ensuring that coins are correctly sorted and that invalid coins are handled appropriately. This is crucial for maintaining the order and integrity of coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [85-85]

The TestSearch method tests the search functionality within sets of coins, ensuring that coins can be accurately found or determined to be absent. This supports operations that need to query coin sets for specific denominations.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [87-87]

The TestCoinsIsAnyGTE method tests whether any coin in one set is greater than or equal to any coin in another set, covering various scenarios. This is important for operations that need to compare coin sets in a non-strict manner.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [89-89]

The TestCoinsIsAllGT method tests whether all coins in one set are strictly greater than all coins in another set, ensuring the accuracy of this comparison logic. This supports the correctness of operations that rely on strict comparisons between coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [91-91]

The TestCoinsIsAllGTE method tests whether all coins in one set are greater than or equal to all coins in another set, ensuring the accuracy of this comparison logic. This is crucial for operations that rely on non-strict comparisons between coin sets.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [93-93]

The TestNewCoins method tests the creation of new coin sets, including cases that should panic due to invalid inputs. This ensures that coin sets are correctly initialized and validated upon creation.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [95-95]

The TestCoinsIsAnyGT method tests whether any coin in one set is strictly greater than any coin in another set, covering various scenarios. This is important for operations that need to compare coin sets in a specific manner.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [97-97]

The TestCoinsIsAnyNil method tests whether any coin in a set is considered nil, ensuring that sets with uninitialized or partially initialized coins are correctly identified. This is crucial for handling coin sets safely.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [99-99]

The TestMarshalJSONCoins method tests the JSON marshaling and unmarshaling of coin sets, ensuring that coin sets can be accurately represented in JSON format and reconstructed from it. This is important for data serialization and communication.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [101-101]

The TestCoinValidate method tests the validation logic for individual coins, including various invalid scenarios. This comprehensive testing is essential for ensuring that only valid coins are accepted by the system.

  • 106-111: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [103-103]

The TestCoinAminoEncoding method tests the Amino encoding and decoding of individual coins, ensuring that coins can be accurately serialized and deserialized using the Amino codec. This is important for data persistence and communication.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between 2c040a4 and d493fc6.
Files selected for processing (1)
  • types/coin_regex.go (1 hunks)
Files not summarized due to errors (1)
  • types/coin_regex.go: Error: Message exceeds token limit
Additional comments: 2
types/coin_regex.go (2)
  • 8-455: The MatchDenom function is well-structured and follows the generated pattern from Ragel for state machine-based parsing. The function aims to validate coin denominations using a finite state machine, which is a more efficient approach compared to regex parsing. The transition arrays and logic are generated by Ragel, ensuring correctness as long as the .rl file is correctly defined. The function returns a boolean indicating whether the input matches the denom pattern, which aligns with the PR's objective to improve performance and maintainability by replacing regex-based validation.
  • 458-972: The MatchDecCoin function introduces a more complex state machine to parse decimal coin amounts, capturing the start and end positions of the amount and denomination. This function is a significant improvement in terms of performance and precision over regex-based parsing for decimal coins. The use of action codes to mark the start and end of the amount and to adjust the end position of the denomination is a clever use of Ragel's capabilities. However, it's crucial to ensure that the .rl file accurately defines the expected patterns to avoid any parsing inaccuracies. The function returns the start and end positions for the amount and denomination, along with a validity flag, which is a practical approach for downstream processing.

types/coin.go Outdated Show resolved Hide resolved
@facundomedica
Copy link
Member

I'm not familiar with Ragel, could we get a small benchmark to compare? Also, how is the code generated? Thank you!

@mattverse
Copy link
Contributor Author

@facundomedica of course! I'll get the benchmark tomorrow and reply in the PR

types/coin.go Outdated Show resolved Hide resolved
types/coin.go Outdated Show resolved Hide resolved
types/coin_regex.rl Show resolved Hide resolved
types/coin_regex.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between d493fc6 and fac684e.
Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • types/coin.go (2 hunks)
  • types/coin_regex.go (1 hunks)
  • types/coin_regex.rl (1 hunks)
Files not summarized due to errors (1)
  • types/coin_regex.go: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (2)
  • types/coin.go
  • types/coin_regex.rl
Additional comments: 4
CHANGELOG.md (4)
  • 45-45: The entry for replacing regex parsing with direct matching methods for denom validation accurately reflects the PR's main objective. It's clear and directly links to the PR, facilitating easy reference.
  • 45-45: The addition of a ValueCodec for the math.Uint type in collections maps is mentioned. This change seems unrelated to the main focus of the PR but is a valuable addition to the SDK. Ensure that this change has been documented in the appropriate sections of the SDK documentation for developers to understand its usage and benefits.
Verification successful

The search results indicate that the documentation, specifically within the collections/README.md file, has been updated to mention "math.Uint" multiple times. This suggests that the addition of a ValueCodec for the math.Uint type in collections maps, as mentioned in the review comment, is likely documented, albeit not with the explicit use of the term "ValueCodec". Given the context and the findings, it appears that the documentation has been updated in relation to the change, but the specific terminology used in the review comment may not have been directly matched in the documentation.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for documentation updates related to ValueCodec for math.Uint
rg "ValueCodec for math.Uint" docs/

Length of output: 37



Script:

#!/bin/bash
# Broaden the search for documentation updates related to ValueCodec and math.Uint separately
rg "ValueCodec" -or "math.Uint" docs/
# Additionally, search for any mention of ValueCodec or math.Uint in README files or developer guides that might not be in the docs/ directory
rg "ValueCodec" -or "math.Uint" --glob "*README.md" --glob "*developer_guide*"

Length of output: 400

* 45-45: The introduction of the `IsGT` method for `types.Coin` is correctly documented. This method enhances the SDK's functionality by providing a straightforward way to compare coin values. It's important to ensure that examples of using `IsGT` are included in the SDK documentation to help developers understand its application. * 45-45: Adding the `--qrcode` flag to the `keys show` command is a user-friendly feature that enhances the usability of the SDK. This change should be highlighted in the SDK's user guide or command-line tool documentation to ensure users are aware of how to use this new feature.

types/coin_regex.go Show resolved Hide resolved
types/coin_regex.go Outdated Show resolved Hide resolved
types/coin_regex.go Outdated Show resolved Hide resolved
@mattverse
Copy link
Contributor Author

mattverse commented Feb 22, 2024

Finished benchmarking! Bench marked using existing benchmarking test https://github.com/cosmos/cosmos-sdk/blob/main/types/bench_test.go#L18 on parsing coins.

Here are the results:

$ benchstat old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/cosmos/cosmos-sdk/types
             │   old.txt   │              new.txt               │
             │   sec/op    │   sec/op     vs base               │
ParseCoin-12   4.173µ ± 2%   3.846µ ± 1%  -7.83% (p=0.000 n=10)

             │   old.txt    │               new.txt                │
             │     B/op     │     B/op      vs base                │
ParseCoin-12   2.156Ki ± 0%   1.719Ki ± 0%  -20.29% (p=0.000 n=10)

             │  old.txt   │              new.txt               │
             │ allocs/op  │ allocs/op   vs base                │
ParseCoin-12   71.00 ± 0%   63.00 ± 0%  -11.27% (p=0.000 n=10)

We can see

  • 7% decrease in operation time
  • average of approx 20% improvement on memory usage
  • approx 11% improvement on heap allocation !

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between fac684e and 54e29ff.
Files selected for processing (2)
  • types/coin_regex.go (1 hunks)
  • types/coin_regex.rl (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • types/coin_regex.rl
Additional comments: 2
types/coin_regex.go (2)
  • 2-5: The file header clearly states that this is generated code and should not be edited manually. It also specifies the source file and disables certain linter warnings. This is a good practice for generated code to ensure maintainability and clarity about its generation and purpose.
  • 8-15: The comments explaining the previous regex patterns provide valuable context for understanding the transition from regex to Ragel. Including a brief comment about the choice of Ragel over regex, as suggested in the existing comments, is beneficial for future maintainers.

types/coin_regex.go Outdated Show resolved Hide resolved
types/coin_regex.go Outdated Show resolved Hide resolved
@ValarDragon
Copy link
Contributor

I think the speed improvement is actually way higher. The IBC hashes are going to be much longer, but this isn't in the coin benchmarks

Copy link
Member

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between using regal over a simple go loop since we know the input. Bez mentioned that it is recommended to move away from regular expression once there is understanding of the format.

@mattverse would love to better understand the reasoning behind using regal over native golang

@ValarDragon
Copy link
Contributor

Ragel is native golang, its just code generated. I think its worth benchmarking against the for loop I put in the GH issue as well though

@facundomedica
Copy link
Member

I just benchmarked only the ValidateDenom, first regex vs loop:

benchstat 20runs_regex.txt 20runs_loop.txt
goos: darwin
goarch: arm64
pkg: github.com/cosmos/cosmos-sdk/types
                │ 20runs_regex.txt │           20runs_loop.txt           │
                │      sec/op      │   sec/op     vs base                │
ValidateDenom-8       1913.5n ± 0%   484.4n ± 1%  -74.69% (p=0.000 n=20)

                │ 20runs_regex.txt │        20runs_loop.txt         │
                │       B/op       │    B/op     vs base            │
ValidateDenom-8         0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=20) ¹
¹ all samples are equal

                │ 20runs_regex.txt │        20runs_loop.txt         │
                │    allocs/op     │ allocs/op   vs base            │
ValidateDenom-8         0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=20) ¹
¹ all samples are equal

Now loop vs ragel:

benchstat 20runs_loop.txt 20runs_ragel.txt 
goos: darwin
goarch: arm64
pkg: github.com/cosmos/cosmos-sdk/types
                │ 20runs_loop.txt │          20runs_ragel.txt           │
                │     sec/op      │   sec/op     vs base                │
ValidateDenom-8       484.4n ± 1%   124.8n ± 1%  -74.25% (p=0.000 n=20)

                │ 20runs_loop.txt │       20runs_ragel.txt       │
                │      B/op       │    B/op     vs base          │
ValidateDenom-8          0.0 ± 0%   160.0 ± 0%  ? (p=0.000 n=20)

                │ 20runs_loop.txt │       20runs_ragel.txt       │
                │    allocs/op    │ allocs/op   vs base          │
ValidateDenom-8        0.000 ± 0%   2.000 ± 0%  ? (p=0.000 n=20)

Used these denoms:

var denoms = []string{
	"ATM",
	"AMX",
	"XXX",
	"BTC",
	"ibc/27394FB092D2ECCD56123C74F36E4C1F926001CEADA9CA97EA622B25F41E5EB2",
	"ibc/498A0751C798A0D9A389AA3691123DADA57DAA4FE165D5C75894505B876BA6E4",
	"uatom",
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between 54e29ff and fe75550.
Files selected for processing (3)
  • types/coin_regex.go (1 hunks)
  • types/coin_regex.rl (1 hunks)
  • types/dec_coin.go (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • types/coin_regex.go
  • types/coin_regex.rl
  • types/dec_coin.go

@mattverse
Copy link
Contributor Author

@facundomedica Amazing! Thanks so much 🙇 So is the claim loop shows approx 75% decrease then regel showing another 75% decrease from there? Also curious about other stats!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between 8bc1443 and e81f4c9.
Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md

return fmt.Errorf("invalid denom: %s", denom)
if reDnm == nil || reDecCoin == nil {
// Convert the string to a byte slice as required by the Ragel-generated function.
denomBytes := []byte(denom)
Copy link
Contributor

@ValarDragon ValarDragon Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we should do internal.unsafeconvstringtobytes

That should fix the allocations in this fn

Copy link
Contributor

@ValarDragon ValarDragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Nice job! Looking forward to this dropping out of profiles

@@ -0,0 +1,39 @@
// Code generated by regel using `ragel -Z coin_regex.rl`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is misleading, this code wasn't generated right? It was written and then coin_regex.go was generated from it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here goes directly to the generated code( coin_regex.go ). Changed in the most recent commit to be more clear!

@tac0turtle
Copy link
Member

once linting passes we can merge. Thank you spending time on this

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between e81f4c9 and 7018ae4.
Files selected for processing (3)
  • types/coin.go (2 hunks)
  • types/coin_regex.go (1 hunks)
  • types/coin_regex.rl (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • types/coin.go
  • types/coin_regex.go
  • types/coin_regex.rl

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between 7018ae4 and c4e1abe.
Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • tests/integration/bank/keeper/deterministic_test.go (1 hunks)
  • tests/integration/staking/keeper/deterministic_test.go (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • CHANGELOG.md
  • tests/integration/bank/keeper/deterministic_test.go
  • tests/integration/staking/keeper/deterministic_test.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between c4e1abe and 1a87ce7.
Files selected for processing (1)
  • types/coin_regex.go (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • types/coin_regex.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: .coderabbit.yml

Commits Files that changed from the base of the PR and between 1a87ce7 and 97b691f.
Files selected for processing (1)
  • .golangci.yml (1 hunks)
Files skipped from review due to trivial changes (1)
  • .golangci.yml

@facundomedica facundomedica added this pull request to the merge queue Mar 7, 2024
Merged via the queue into cosmos:main with commit a1e3a85 Mar 7, 2024
57 of 60 checks passed
mattverse added a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 8, 2024
mattverse added a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 13, 2024
mattverse added a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 13, 2024
czarcas7ic added a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 16, 2024
* feat(x/bank): Replace regex parsing of denom validation to generated parsing (cosmos#19511)

* fix: fix ragel denom validation introduced in cosmos#19511 (cosmos#19701)

* fix: use loop instead of ragel (cosmos#19710)

* Fix test

* Fix integration test as well

* Try linting

* Fix lint

* Fix lint

* add changelog

---------

Co-authored-by: Marko <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
mergify bot pushed a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 16, 2024
* feat(x/bank): Replace regex parsing of denom validation to generated parsing (cosmos#19511)

* fix: fix ragel denom validation introduced in cosmos#19511 (cosmos#19701)

* fix: use loop instead of ragel (cosmos#19710)

* Fix test

* Fix integration test as well

* Try linting

* Fix lint

* Fix lint

* add changelog

---------

Co-authored-by: Marko <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
(cherry picked from commit ab4bc05)
mergify bot pushed a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 16, 2024
* feat(x/bank): Replace regex parsing of denom validation to generated parsing (cosmos#19511)

* fix: fix ragel denom validation introduced in cosmos#19511 (cosmos#19701)

* fix: use loop instead of ragel (cosmos#19710)

* Fix test

* Fix integration test as well

* Try linting

* Fix lint

* Fix lint

* add changelog

---------

Co-authored-by: Marko <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
(cherry picked from commit ab4bc05)
czarcas7ic pushed a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 16, 2024
* feat(x/bank): Replace regex parsing of denom validation to generated parsing (cosmos#19511)

* fix: fix ragel denom validation introduced in cosmos#19511 (cosmos#19701)

* fix: use loop instead of ragel (cosmos#19710)

* Fix test

* Fix integration test as well

* Try linting

* Fix lint

* Fix lint

* add changelog

---------

Co-authored-by: Marko <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
(cherry picked from commit ab4bc05)

Co-authored-by: Matt, Park <[email protected]>
czarcas7ic pushed a commit to osmosis-labs/cosmos-sdk that referenced this pull request Mar 16, 2024
* feat(x/bank): Replace regex parsing of denom validation to generated parsing (cosmos#19511)

* fix: fix ragel denom validation introduced in cosmos#19511 (cosmos#19701)

* fix: use loop instead of ragel (cosmos#19710)

* Fix test

* Fix integration test as well

* Try linting

* Fix lint

* Fix lint

* add changelog

---------

Co-authored-by: Marko <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
(cherry picked from commit ab4bc05)

Co-authored-by: Matt, Park <[email protected]>
czarcas7ic added a commit to osmosis-labs/cosmos-sdk that referenced this pull request May 9, 2024
* feat(x/bank): Replace regex parsing of denom validation to generated parsing (cosmos#19511)

* fix: fix ragel denom validation introduced in cosmos#19511 (cosmos#19701)

* fix: use loop instead of ragel (cosmos#19710)

* Fix test

* Fix integration test as well

* Try linting

* Fix lint

* Fix lint

* add changelog

---------

Co-authored-by: Marko <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Co-authored-by: Adam Tucker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Move away from Regexp parsing of denom validity / decimal amounts
6 participants